An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile
Authors
Abstract
RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
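The enumeration idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: a "maximal placement" is read here as a set of non-overlapping fragments of length F on a transcript of length T in which every leftover gap is shorter than F, and every such configuration is weighted equally. The function names `maximal_placements` and `expected_profiles` are hypothetical, and the paper's actual model may differ.

```python
from fractions import Fraction


def maximal_placements(T, F):
    """Yield every maximal placement as a tuple of 0-based fragment starts.

    Assumption: a placement is maximal when fragments do not overlap and
    every gap (before, between, and after fragments) is shorter than F.
    """
    def extend(pos, starts):
        # pos is the first position not yet assigned to a fragment or a gap
        if T - pos < F:
            # the remaining tail is shorter than F, so nothing more fits
            yield tuple(starts)
            return
        # a fragment must follow after a gap of 0 .. F-1 positions
        for gap in range(F):
            start = pos + gap
            if start + F > T:
                break
            yield from extend(start + F, starts + [start])

    yield from extend(0, [])


def expected_profiles(T, F):
    """Return (number of configurations, start profile, coverage profile).

    Each profile is a per-position average over all configurations,
    assuming every configuration is equally likely.
    """
    configs = list(maximal_placements(T, F))
    n = len(configs)
    start_counts = [0] * T
    cover_counts = [0] * T
    for starts in configs:
        for s in starts:
            start_counts[s] += 1
            for p in range(s, s + F):
                cover_counts[p] += 1
    start_profile = [Fraction(c, n) for c in start_counts]
    coverage_profile = [Fraction(c, n) for c in cover_counts]
    return n, start_profile, coverage_profile


if __name__ == "__main__":
    n, starts, coverage = expected_profiles(5, 2)
    print(n, [str(x) for x in starts], [str(x) for x in coverage])
```

Under these assumptions, T = 5 and F = 2 give three configurations, with fragment starts (0, 2), (0, 3), and (1, 3). The resulting coverage profile is 2/3, 1, 2/3, 1, 2/3, so even this toy case is not uniform across the transcript.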
Similar resources
Designing a Neural-Network-Based Model for Identifying and Analyzing Unnatural Patterns in Process Control Charts
Because of their capabilities, neural networks are used for pattern recognition. In statistical process control charts, common-cause variation distorts the expected form of unnatural patterns, so detecting assignable causes efficiently and precisely in real time is difficult. It is therefore logical to propose neural-network-based models for recognition and analysis of patterns in proce...
P-121: Cloning and Expression of The Inosine Triphosphate Pyrophosphatase Gene Variant II in E. coli
Background: Inappropriate environmental and cellular conditions can damage the cell's nucleotide pool. Deamination and oxidation damage interferes with the cell's vital reactions. Inosine triphosphate pyrophosphatase (ITPA), an evolutionarily conserved enzyme, plays a critical role in the elimination of non-canonical bases. In the human genome, the ITPA gene is located on the short arm of chromosome 20 and tran...
Comments on Nonfinite Adverbial Patterns in English Prose Fiction: A Simple Model for Analysis and Use
This study aims to present an accessible model of some frequent nonfinite adverbial types occurring in English prose fiction. As its main syntactic argument, it recognizes that these adverbials are mostly elliptical in that there are some dependent-clause markers one can assume to be implicit when supplying those elements back into the clause complex. Some comments are provided at the end on th...
Cloning and expression of fragment of the rabies virus nucleoprotein gene in Escherichia coli and evaluation of antigenicity of the expression product
Rabies virus nucleoprotein (N protein) encapsidates the genomic RNA of the virus and forms the viral ribonucleoprotein complex. These N proteins represent highly organized structures that activate the proliferation of B cells and the production of antibodies against the N protein. In addition to its effect on B cells, the rabies virus N protein has been shown to induce potent T helper cell responses resulting in a long...
Strategies and Clinical Applications of Next Generation Sequencing
Abstract: DNA sequencing is one of the most valuable techniques in molecular biology and is used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology; as a result, the whole human genome can be sequenced at low cost in several days. NGS technology is simi...
Journal title:
Volume 24, Issue
Pages -
Publication date: 2017